Pesquisa | Portal Regional da BVS

1.

Decoding speech sounds from neurophysiological data: Practical considerations and theoretical implications.

Sarrett, McCall E; Toscano, Joseph C.

Psychophysiology ; 61(4): e14475, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-37947235

RESUMO

Machine learning techniques have proven to be a useful tool in cognitive neuroscience. However, their implementation in scalp-recorded electroencephalography (EEG) is relatively limited. To address this, we present three analyses using data from a previous study that examined event-related potential (ERP) responses to a wide range of naturally-produced speech sounds. First, we explore which features of the EEG signal best maximize machine learning accuracy for a voicing distinction, using a support vector machine (SVM). We manipulate three dimensions of the EEG signal as input to the SVM: number of trials averaged, number of time points averaged, and polynomial fit. We discuss the trade-offs in using different feature sets and offer some recommendations for researchers using machine learning. Next, we use SVMs to classify specific pairs of phonemes, finding that we can detect differences in the EEG signal that are not otherwise detectable using conventional ERP analyses. Finally, we characterize the timecourse of phonetic feature decoding across three phonological dimensions (voicing, manner of articulation, and place of articulation), and find that voicing and manner are decodable from neural activity, whereas place of articulation is not. This set of analyses addresses both practical considerations in the application of machine learning to EEG, particularly for speech studies, and also sheds light on current issues regarding the nature of perceptual representations of speech.

Assuntos

Fonética , Percepção da Fala , Humanos , Percepção da Fala/fisiologia , Fala/fisiologia , Potenciais Evocados , Eletroencefalografia/métodos

2.

Effects of experience on recognition of speech produced with a face mask.

Crinnion, Anne Marie; Toscano, Joseph C; Toscano, Cheyenne M.

Cogn Res Princ Implic ; 7(1): 46, 2022 05 26.

Artigo em Inglês | MEDLINE | ID: mdl-35616742

RESUMO

Over the past two years, face masks have been a critical tool for preventing the spread of COVID-19. While previous studies have examined the effects of masks on speech recognition, much of this work was conducted early in the pandemic. Given that human listeners are able to adapt to a wide variety of novel contexts in speech perception, an open question concerns the extent to which listeners have adapted to masked speech during the pandemic. In order to evaluate this, we replicated Toscano and Toscano (PLOS ONE 16(2):e0246842, 2021), looking at the effects of several types of face masks on speech recognition in different levels of multi-talker babble noise. We also examined the effects of listeners' self-reported frequency of encounters with masked speech and the effects of the implementation of public mask mandates on speech recognition. Overall, we found that listeners' performance in the current experiment (with data collected in 2021) was similar to that of listeners in Toscano and Toscano (with data collected in 2020) and that performance did not differ based on mask experience. These findings suggest that listeners may have already adapted to masked speech by the time data were collected in 2020, are unable to adapt to masked speech, require additional context to be able to adapt, or that talkers also changed their productions over time. Implications for theories of perceptual learning in speech are discussed.

Assuntos

COVID-19 , Percepção da Fala , Humanos , Máscaras , Ruído , Fala

3.

Rethinking the McGurk effect as a perceptual illusion.

Getz, Laura M; Toscano, Joseph C.

Atten Percept Psychophys ; 83(6): 2583-2598, 2021 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-33884572

RESUMO

Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk & Macdonald (Nature 264, 746-748 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments, using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, thus showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations.Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.

Assuntos

Ilusões , Percepção da Fala , Percepção Auditiva , Humanos , Fala , Percepção Visual

4.

Effects of face masks on speech recognition in multi-talker babble noise.

Toscano, Joseph C; Toscano, Cheyenne M.

PLoS One ; 16(2): e0246842, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33626073

RESUMO

Face masks are an important tool for preventing the spread of COVID-19. However, it is unclear how different types of masks affect speech recognition in different levels of background noise. To address this, we investigated the effects of four masks (a surgical mask, N95 respirator, and two cloth masks) on recognition of spoken sentences in multi-talker babble. In low levels of background noise, masks had little to no effect, with no more than a 5.5% decrease in mean accuracy compared to a no-mask condition. In high levels of noise, mean accuracy was 2.8-18.2% lower than the no-mask condition, but the surgical mask continued to show no significant difference. The results demonstrate that different types of masks generally yield similar accuracy in low levels of background noise, but differences between masks become more apparent in high levels of noise.

Assuntos

Percepção Auditiva/fisiologia , Máscaras , Percepção da Fala/fisiologia , Adulto , COVID-19/prevenção & controle , COVID-19/psicologia , COVID-19/transmissão , Feminino , Humanos , Idioma , Masculino , Máscaras/efeitos adversos , Respiradores N95/efeitos adversos , Ruído , SARS-CoV-2/isolamento & purificação , Fala/fisiologia

5.

The time-course of speech perception revealed by temporally-sensitive neural measures.

Getz, Laura M; Toscano, Joseph C.

Wiley Interdiscip Rev Cogn Sci ; 12(2): e1541, 2021 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-32767836

RESUMO

Recent advances in cognitive neuroscience have provided a detailed picture of the early time-course of speech perception. In this review, we highlight this work, placing it within the broader context of research on the neurobiology of speech processing, and discuss how these data point us toward new models of speech perception and spoken language comprehension. We focus, in particular, on temporally-sensitive measures that allow us to directly measure early perceptual processes. Overall, the data provide support for two key principles: (a) speech perception is based on gradient representations of speech sounds and (b) speech perception is interactive and receives input from higher-level linguistic context at the earliest stages of cortical processing. Implications for models of speech processing and the neurobiology of language more broadly are discussed. This article is categorized under: Psychology > Language Psychology > Perception and Psychophysics Neuroscience > Cognition.

Assuntos

Neurociência Cognitiva , Compreensão , Idioma , Fonética , Percepção da Fala/fisiologia , Potenciais Evocados/fisiologia , Humanos , Imageamento por Ressonância Magnética

6.

A graph-theoretic approach to identifying acoustic cues for speech sound categorization.

Crinnion, Anne Marie; Malmskog, Beth; Toscano, Joseph C.

Psychon Bull Rev ; 27(6): 1104-1125, 2020 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-32671571

RESUMO

Human speech contains a wide variety of acoustic cues that listeners must map onto distinct phoneme categories. The large amount of information contained in these cues contributes to listeners' remarkable ability to accurately recognize speech across a variety of contexts. However, these cues vary across talkers, both in terms of how specific cue values map onto different phonemes and in terms of which cues individual talkers use most consistently to signal specific phonological contrasts. This creates a challenge for models that aim to characterize the information used to recognize speech. How do we balance the need to account for variability in speech sounds across a wide range of talkers with the need to avoid overspecifying which acoustic cues describe the mapping from speech sounds onto phonological distinctions? We present an approach using tools from graph theory that addresses this issue by creating networks describing connections between individual talkers and acoustic cues and by identifying subgraphs within these networks. This allows us to reduce the space of possible acoustic cues that signal a given phoneme to a subset that still accounts for variability across talkers, simplifying the model and providing insights into which cues are most relevant for specific phonemes. Classifiers trained on the subset of cue dimensions identified in the subgraphs provide fits to listeners' categorization that are similar to those obtained for classifiers trained on all cue dimensions, demonstrating that the subgraphs capture the cues necessary to categorize speech sounds.

Assuntos

Sinais (Psicologia) , Modelos Teóricos , Acústica da Fala , Percepção da Fala/fisiologia , Humanos

7.

Electrophysiological Evidence for Top-Down Lexical Influences on Early Speech Perception.

Getz, Laura M; Toscano, Joseph C.

Psychol Sci ; 30(6): 830-841, 2019 06.

Artigo em Inglês | MEDLINE | ID: mdl-31018103

RESUMO

An unresolved issue in speech perception concerns whether top-down linguistic information influences perceptual responses. We addressed this issue using the event-related-potential technique in two experiments that measured cross-modal sequential-semantic priming effects on the auditory N1, an index of acoustic-cue encoding. Participants heard auditory targets (e.g., "potatoes") following associated visual primes (e.g., "MASHED"), neutral visual primes (e.g., "FACE"), or a visual mask (e.g., "XXXX"). Auditory targets began with voiced (/b/, /d/, /g/) or voiceless (/p/, /t/, /k/) stop consonants, an acoustic difference known to yield differences in N1 amplitude. In Experiment 1 (N = 21), semantic context modulated responses to upcoming targets, with smaller N1 amplitudes for semantic associates. In Experiment 2 (N = 29), semantic context changed how listeners encoded sounds: Ambiguous voice-onset times were encoded similarly to the voicing end point elicited by semantic associates. These results are consistent with an interactive model of spoken-word recognition that includes top-down effects on early perception.

Assuntos

Percepção Auditiva/fisiologia , Semântica , Percepção da Fala/fisiologia , Fenômenos Eletrofisiológicos , Potenciais Evocados , Feminino , Humanos , Masculino , Modelos Neurológicos , Fonética , Tempo de Reação , Adulto Jovem

8.

Age-Related Changes in Temporal and Spectral Cue Weights in Speech.

Toscano, Joseph C; Lansing, Charissa R.

Lang Speech ; 62(1): 61-79, 2019 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-29103359

RESUMO

Listeners weight acoustic cues in speech according to their reliability, but few studies have examined how cue weights change across the lifespan. Previous work has suggested that older adults have deficits in auditory temporal discrimination, which could affect the reliability of temporal phonetic cues, such as voice onset time (VOT), and in turn, impact speech perception in real-world listening environments. We addressed this by examining younger and older adults' use of VOT and onset F0 (a secondary phonetic cue) for voicing judgments (e.g., /b/ vs. /p/), using both synthetic and naturally produced speech. We found age-related differences in listeners' use of the two voicing cues, such that older adults relied more heavily on onset F0 than younger adults, even though this cue is less reliable in American English. These results suggest that phonetic cue weights continue to change across the lifespan.

Assuntos

Envelhecimento/psicologia , Sinais (Psicologia) , Periodicidade , Acústica da Fala , Percepção da Fala , Qualidade da Voz , Adolescente , Adulto , Fatores Etários , Humanos , Julgamento , Pessoa de Meia-Idade , Fatores de Tempo , Adulto Jovem

9.

Reassessing the electrophysiological evidence for categorical perception of Mandarin lexical tone: ERP evidence from native and naïve non-native Mandarin listeners.

Gao, Yang Agnes; Toscano, Joseph C; Shih, Chilin; Tanner, Darren.

Atten Percept Psychophys ; 81(2): 543-557, 2019 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-30378083

RESUMO

Some studies have argued that native speakers of tonal languages have been shown to perceive lexical tone continua in a more categorical manner than speakers of non-tonal languages. Among these, Zhang and colleagues (NeuroReport 23 (1): 35-9) conducted an event-related potential (ERP) study using an oddball paradigm showing that native Mandarin speakers exhibit different sensitivity to deviant tones that cross category boundaries compared to deviants that belong to the same category as the standard. Other recent ERP findings examining consonant voicing categories question whether perception is truly categorical. The current study investigated these discrepant findings by replicating and extending the Zhang et al. study. Native Mandarin speakers and naïve English speakers performed an auditory oddball detection test while ERPs were recorded. Naïve English speakers were included to test for language experience effects. We found that Mandarin speakers and English speakers demonstrated qualitatively similar responses, in that both groups showed a larger N2 to the across-category deviant and a larger P3 to the within-category deviant. The N2/P3 pattern also did not differ in scalp topography for the within- versus across-category deviants, as was reported by Zhang et al. Cross-language differences surfaced in behavioral results, where Mandarin speakers showed better discrimination for the across-category deviant, but English speakers showed better discrimination for within-category deviants, though all results were near-ceiling. Our results therefore support models suggesting that listeners remain sensitive to gradient acoustic differences in speech even when they have learned phonological categories along an acoustic dimension.

Assuntos

Percepção Auditiva/fisiologia , Percepção da Altura Sonora/fisiologia , Percepção da Fala/fisiologia , Adulto , Potenciais Evocados/fisiologia , Feminino , Humanos , Linguística , Masculino , Adulto Jovem

10.

Perceptual Encoding in Auditory Brainstem Responses: Effects of Stimulus Frequency.

Tabachnick, Alexandra R; Toscano, Joseph C.

J Speech Lang Hear Res ; 61(9): 2364-2375, 2018 09 19.

Artigo em Inglês | MEDLINE | ID: mdl-30193361

RESUMO

Purpose: A central question about auditory perception concerns how acoustic information is represented at different stages of processing. The auditory brainstem response (ABR) provides a potentially useful index of the earliest stages of this process. However, it is unclear how basic acoustic characteristics (e.g., differences in tones spanning a wide range of frequencies) are indexed by ABR components. This study addresses this by investigating how ABR amplitude and latency track stimulus frequency for tones ranging from 250 to 8000 Hz. Method: In a repeated-measures experimental design, listeners were presented with brief tones (250, 500, 1000, 2000, 4000, and 8000 Hz) in random order while electroencephalography was recorded. ABR latencies and amplitudes for Wave V (6-9 ms) and in the time window following the Wave V peak (labeled as Wave VI; 9-12 ms) were measured. Results: Wave V latency decreased with increasing frequency, replicating previous work. In addition, Waves V and VI amplitudes tracked differences in tone frequency, with a nonlinear response from 250 to 8000 Hz and a clear log-linear response to tones from 500 to 8000 Hz. Conclusions: Results demonstrate that the ABR provides a useful measure of early perceptual encoding for stimuli varying in frequency and that the tonotopic organization of the auditory system is preserved at this stage of processing for stimuli from 500 to 8000 Hz. Such a measure may serve as a useful clinical tool for evaluating a listener's ability to encode specific frequencies in sounds. Supplemental Material: https://doi.org/10.23641/asha.6987422.

Assuntos

Estimulação Acústica/métodos , Percepção Auditiva/fisiologia , Potenciais Evocados Auditivos do Tronco Encefálico/fisiologia , Acústica , Adulto , Eletroencefalografia , Feminino , Humanos , Masculino , Adulto Jovem

11.

The time-course of cortical responses to speech revealed by fast optical imaging.

Toscano, Joseph C; Anderson, Nathaniel D; Fabiani, Monica; Gratton, Gabriele; Garnsey, Susan M.

Brain Lang ; 184: 32-42, 2018 09.

Artigo em Inglês | MEDLINE | ID: mdl-29960165

RESUMO

Recent work has sought to describe the time-course of spoken word recognition, from initial acoustic cue encoding through lexical activation, and identify cortical areas involved in each stage of analysis. However, existing methods are limited in either temporal or spatial resolution, and as a result, have only provided partial answers to the question of how listeners encode acoustic information in speech. We present data from an experiment using a novel neuroimaging method, fast optical imaging, to directly assess the time-course of speech perception, providing non-invasive measurement of speech sound representations, localized to specific cortical areas. We find that listeners encode speech in terms of continuous acoustic cues at early stages of processing (ca. 96â¯ms post-stimulus onset), and begin activating phonological category representations rapidly (ca. 144â¯ms post-stimulus). Moreover, cue-based representations are widespread in the brain and overlap in time with graded category-based representations, suggesting that spoken word recognition involves simultaneous activation of both continuous acoustic cues and phonological categories.

Assuntos

Encéfalo/diagnóstico por imagem , Percepção da Fala/fisiologia , Fala/fisiologia , Adulto , Encéfalo/fisiologia , Sinais (Psicologia) , Eletroencefalografia , Feminino , Humanos , Masculino , Neuroimagem , Imagem Óptica , Fonética , Adulto Jovem

12.

Effects of Participant Engagement on Prosodic Prominence.

Buxó-Lugo, Andrés; Toscano, Joseph C; Watson, Duane G.

Discourse Process ; 55(3): 305-323, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-31097846

RESUMO

It is generally assumed that prosodic cues that provide linguistic information, like discourse status, are driven primarily by the information structure of the conversation. This article investigates whether speakers have the capacity to adjust subtle acoustic-phonetic properties of the prosodic signal when they find themselves in contexts in which accurate communication is important. Thus, we examine whether the communicative context, in addition to discourse structure, modulates prosodic choices when speakers produce acoustic prominence. We manipulated the discourse status of target words in the context of a highly communicative task (i.e., working with a partner to solve puzzles in the computer game Minecraft) and in the context of a less communicative task more typical of psycholinguistic experiments (i.e., picture description). Speakers in the more communicative task produced prosodic cues to discourse structure that were more discriminable than those in the less communicative task. In a second experiment, we found that the presence or absence of a conversational partner drove some, but not all, of these effects. Together, these results suggest that speakers can modulate the prosodic signal in response to the communicative and social context.

13.

Modeling the Development of Audiovisual Cue Integration in Speech Perception.

Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C.

Brain Sci ; 7(3)2017 Mar 21.

Artigo em Inglês | MEDLINE | ID: mdl-28335558

RESUMO

Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.

14.

The time-course of speaking rate compensation: Effects of sentential rate and vowel length on voicing judgments.

Toscano, Joseph C; McMurray, Bob.

Lang Cogn Neurosci ; 30(5): 529-543, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-25780801

RESUMO

Many sources of context information in speech (such as speaking rate) occur either before or after the phonetic cues they influence, yet there is little work examining the time-course of these effects. Here, we investigate how listeners compensate for preceding sentence rate and subsequent vowel length (a secondary cue that has been used as a proxy for speaking rate) when categorizing words varying in voice-onset time (VOT). Participants selected visual objects in a display while their eye-movements were recorded, allowing us to examine when each source of information had an effect on lexical processing. We found that the effect of VOT preceded that of vowel length, suggesting that each cue is used as it becomes available. In a second experiment, we found that, in contrast, the effect of preceding sentence rate occurred simultaneously with VOT, suggesting that listeners interpret VOT relative to preceding rate.

15.

Across- and within-consonant errors for isolated syllables in noise.

Toscano, Joseph C; Allen, Jont B.

J Speech Lang Hear Res ; 57(6): 2293-307, 2014 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-25087617

RESUMO

PURPOSE: A critical issue in assessing speech recognition involves understanding the factors that cause listeners to make errors. Models like the articulation index show that average error decreases logarithmically with increases in signal-to-noise ratio (SNR). The authors investigated (a) whether this log-linear relationship holds across consonants and for individual tokens and (b) what accounts for differences in error rates at the across- and within-consonant levels. METHOD: Listeners with normal hearing heard CV syllables (16 consonants and 4 vowels) spoken by 14 talkers, presented at 6 SNRs. Stimuli were presented randomly, and listeners indicated which syllable they heard. RESULTS: The log-linear relationship between error and SNR holds across consonants but breaks down at the token level. These 2 sources of variability (across- and within-consonant factors) explain the majority of listeners' errors. Moreover, simply adjusting for differences in token-level error thresholds explains 62% of the variability in listeners' responses. CONCLUSIONS: These results demonstrate that speech tests must control for the large variability among tokens, not average across them, as is commonly done in clinical practice. Accounting for token-level differences in error thresholds with listeners with normal hearing provides a basis for tests designed to diagnostically evaluate individual differences with listeners with hearing impairment.

Assuntos

Limiar Auditivo/fisiologia , Fonética , Razão Sinal-Ruído , Percepção da Fala , Estimulação Acústica/métodos , Adulto , Feminino , Humanos , Modelos Lineares , Masculino , Ruído , Adulto Jovem

16.

Reconsidering the role of temporal order in spoken word recognition.

Toscano, Joseph C; Anderson, Nathaniel D; McMurray, Bob.

Psychon Bull Rev ; 20(5): 981-7, 2013 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-23456328

RESUMO

Models of spoken word recognition assume that words are represented as sequences of phonemes. We evaluated this assumption by examining phonemic anadromes, words that share the same phonemes but differ in their order (e.g., sub and bus). Using the visual-world paradigm, we found that listeners show more fixations to anadromes (e.g., sub when bus is the target) than to unrelated words (well) and to words that share the same vowel but not the same set of phonemes (sun). This contrasts with the predictions of existing models and suggests that words are not defined as strict sequences of phonemes.

Assuntos

Movimentos Oculares/fisiologia , Reconhecimento Psicológico/fisiologia , Percepção da Fala/fisiologia , Adulto , Medições dos Movimentos Oculares/instrumentação , Humanos , Psicolinguística/instrumentação , Psicolinguística/métodos , Fatores de Tempo , Adulto Jovem

17.

Cue-integration and context effects in speech: evidence against speaking-rate normalization.

Toscano, Joseph C; McMurray, Bob.

Atten Percept Psychophys ; 74(6): 1284-301, 2012 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-22532385

RESUMO

Listeners are able to accurately recognize speech despite variation in acoustic cues across contexts, such as different speaking rates. Previous work has suggested that listeners use rate information (indicated by vowel length; VL) to modify their use of context-dependent acoustic cues, like voice-onset time (VOT), a primary cue to voicing. We present several experiments and simulations that offer an alternative explanation: that listeners treat VL as a phonetic cue rather than as an indicator of speaking rate, and that they rely on general cue-integration principles to combine information from VOT and VL. We demonstrate that listeners use the two cues independently, that VL is used in both naturally produced and synthetic speech, and that the effects of stimulus naturalness can be explained by a cue-integration model. Together, these results suggest that listeners do not interpret VOT relative to rate information provided by VL and that the effects of speaking rate can be explained by more general cue-integration principles.

Assuntos

Auxiliares de Comunicação para Pessoas com Deficiência , Sinais (Psicologia) , Acústica da Fala , Percepção da Fala , Medida da Produção da Fala , Comportamento Verbal , Fixação Ocular , Humanos , Aprendizagem por Associação de Pares , Reconhecimento Visual de Modelos , Fonética , Movimentos Sacádicos , Semântica , Espectrografia do Som

18.

Continuous perception and graded categorization: electrophysiological evidence for a linear relationship between the acoustic signal and perceptual encoding of speech.

Toscano, Joseph C; McMurray, Bob; Dennhardt, Joel; Luck, Steven J.

Psychol Sci ; 21(10): 1532-40, 2010 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-20935168

RESUMO

Speech sounds are highly variable, yet listeners readily extract information from them and transform continuous acoustic signals into meaningful categories during language comprehension. A central question is whether perceptual encoding captures acoustic detail in a one-to-one fashion or whether it is affected by phonological categories. We addressed this question in an event-related potential (ERP) experiment in which listeners categorized spoken words that varied along a continuous acoustic dimension (voice-onset time, or VOT) in an auditory oddball task. We found that VOT effects were present through a late stage of perceptual processing (N1 component, ~100 ms poststimulus) and were independent of categorization. In addition, effects of within-category differences in VOT were present at a postperceptual categorization stage (P3 component, ~450 ms poststimulus). Thus, at perceptual levels, acoustic information is encoded continuously, independently of phonological information. Further, at phonological levels, fine-grained acoustic differences are preserved along with category information.

Assuntos

Atenção/fisiologia , Potenciais Evocados P300/fisiologia , Potenciais Evocados Auditivos/fisiologia , Modelos Lineares , Fonética , Acústica da Fala , Percepção da Fala/fisiologia , Adolescente , Adulto , Córtex Cerebral/fisiologia , Dominância Cerebral/fisiologia , Eletroencefalografia , Feminino , Humanos , Masculino , Semântica , Adulto Jovem

19.

Cue integration with categories: Weighting acoustic cues in speech using unsupervised learning and distributional statistics.

Toscano, Joseph C; McMurray, Bob.

Cogn Sci ; 34(3): 434-464, 2010 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-21339861

RESUMO

During speech perception, listeners make judgments about the phonological category of sounds by taking advantage of multiple acoustic cues for each phonological contrast. Perceptual experiments have shown that listeners weight these cues differently. How do listeners weight and combine acoustic cues to arrive at an overall estimate of the category for a speech sound? Here, we present several simulations using mixture of Gaussians (MOG) models that learn cue weights and combine cues on the basis of their distributional statistics. We show that a cue-weighting metric in which cues receive weight as a function of their reliability at distinguishing the phonological categories provides a good fit to the perceptual data obtained from human listeners, but only when these weights emerge through the dynamics of learning. These results suggest that cue weights can be readily extracted from the speech signal through unsupervised learning processes.

20.

Statistical learning of phonetic categories: insights from a computational approach.

McMurray, Bob; Aslin, Richard N; Toscano, Joseph C.

Dev Sci ; 12(3): 369-78, 2009 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-19371359

RESUMO

Recent evidence (Maye, Werker & Gerken, 2002) suggests that statistical learning may be an important mechanism for the acquisition of phonetic categories in the infant's native language. We examined the sufficiency of this hypothesis and its implications for development by implementing a statistical learning mechanism in a computational model based on a mixture of Gaussians (MOG) architecture. Statistical learning alone was found to be insufficient for phonetic category learning--an additional competition mechanism was required in order for the categories in the input to be successfully learnt. When competition was added to the MOG architecture, this class of models successfully accounted for developmental enhancement and loss of sensitivity to phonetic contrasts. Moreover, the MOG with competition model was used to explore a potentially important distributional property of early speech categories--sparseness--in which portions of the space between phonetic categories are unmapped. Sparseness was found in all successful models and quickly emerged during development even when the initial parameters favoured continuous representations with no gaps. The implications of these models for phonetic category learning in infants are discussed.

Assuntos

Biologia Computacional/métodos , Desenvolvimento da Linguagem , Aprendizagem , Modelos Psicológicos , Fonética , Algoritmos , Humanos , Lactente , Distribuição Normal , Estatística como Assunto

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA